Sequence Analysis of the Nucleocapsid Protein Gene of Human Coronavirus 229E 
Human coronaviruses are important human pathogens and have also been implicated in multiple sclerosis. To further 
understand the molecular biology of human coronavirus 229E (HCV-229E), molecular cloning and sequence analysis 
of the viral RNA have been initiated. Following established protocols, the 3'-terminal 1732 nucleotides of the genome 
were sequenced. A large open reading frame encodes a 389 amino acid protein of 43,366 Da, which is presumably the 
nucleocapsid protein. The predicted protein is similar in size, chemical properties, and amino acid sequence to the 
nucleocapsid proteins of other coronaviruses. This is especially evident when the sequence is compared with that of 
the antigenically related porcine transmissible gastroenteritis virus (TGEV), with which a region of 46% amino acid 
sequence homology was found. Hydropathy profiles revealed the existence of several conserved domains which could 
have functional significance. An intergenic consensus sequence precedes the 5'-end of the proposed nucleocapsid 
protein gene. The consensus sequence is present in other coronaviruses and has been proposed as the site of binding 
of the leader sequence for mRNA transcriptional start. This region was also examined by primer extension analysis of 
mRNAs, which identified a 60-nucleotide leader sequence. The 3'-noncoding region of the genome contains an 11-nucleotide 
sequence, which is relatively conserved throughout the Coronavirus family and lends support to the theory 
that this region is important for the replication of negative-strand RNA. © 1989 Academic Press, Inc. 


INTRODUCTION 

Human coronavirus 229E (HCV-229E) belongs to one of two major
antigenic groups of human coronaviruses (MacNaughton, 1981). It
shares antigenic relationships with other coronaviruses, such as
porcine transmissible gastroenteritis virus (TGEV), feline infectious
peritonitis virus (FIPV), and canine coronavirus (CCV). The other
well-characterized human coronavirus, HCV-OC43, is in a separate
antigenic group which includes mouse hepatitis virus (MHV) and bovine
coronavirus (BCV). Both human coronaviruses are mainly respiratory
pathogens and have been estimated to cause up to 25% of common colds
(McIntosh et al., 1974; Wege et al., 1982). They have also been
implicated in gastrointestinal diseases (Resta et al., 1985).
Furthermore, the isolation of coronaviruses bearing an antigenic
relationship to HCV-OC43 from the central nervous system of two
patients with multiple sclerosis has suggested a possible etiologic
relationship between human coronaviruses and multiple sclerosis
(Burks et al., 1980). This possibility is supported by the
observation that neurotropic strains of MHV cause demyelination in
the central nervous system of rodents (Weiner and Stohlman, 1978).
Thus, human coronaviruses are important human pathogens.

The structural and biochemical properties of several coronaviruses,
particularly MHV and avian infectious bronchitis virus (IBV), have
been well characterized (Lai et al., 1987; Boursnell et al., 1987).
The virion contains a single-stranded, positive-sense RNA molecule
(molecular weight 6-8 × 10^6 Da) (Lai and Stohlman, 1978) associated
in a helical conformation with nucleocapsid proteins (N). The viral
nucleocapsid is enclosed by an envelope, in which are embedded at
least two types of viral proteins, the peplomer (E2) and matrix (E1)
glycoproteins. Coronavirus RNA replication occurs in the cytoplasm of
infected cells and is mediated by a virus-encoded RNA-dependent RNA
polymerase (Brayton et al., 1982). The virus-specific mRNA in
infected cells comprises a genomic-sized RNA plus six subgenomic mRNA
species. These mRNAs are arranged in a nested-set structure, which is
characterized by RNAs having common 3'-termini but extending for
varying lengths in the 5' direction (Lai et al., 1981). Only the
5'-proximal regions of each mRNA are translated (Rottier et al.,
1981). A unique feature of the structure of coronavirus is the
existence, at the 5'-end of each mRNA, of an identical leader
sequence. This sequence is derived from the 5'-end of the genomic RNA
and is approximately 70 nucleotides in length (Lai et al., 1983,
1984). Recent evidence has supported a role for the leader sequence
in mediating a novel type of discontinuous transcription of genomic
RNA (Baric et al., 1985; Makino et al., 1986; Shieh et al., 1987).

In contrast to other coronaviruses, the molecular biology of human
coronaviruses is relatively poorly understood. The genomic RNA of
both HCV-229E and HCV-OC43 has a molecular weight of approximately
6 × 10^6 Da (Hierholzer et al., 1981). The six subgenomic RNA species
appear to have lower molecular weights than those of the
corresponding MHV RNAs (Weiss and Leibowitz, 1981). The structure of
these mRNAs is not yet known. Analysis of purified HCV-229E virions
has revealed three major polypeptides: a glycosylated protein with a
molecular weight of 180 kDa, a phosphorylated nucleocapsid protein of
50 kDa, and a family of polypeptides with molecular weights of 25,
23, and 21 kDa (Kemp et al., 1984). In addition, several minor
nonstructural polypeptides of 107, 92, and 39 kDa have been
identified (Kemp et al., 1984). The functions of these proteins have
not yet been characterized.

To further understand the molecular biology of HCV-229E, we have
initiated molecular cloning and sequence analysis of HCV-229E RNA. In
this paper we report the sequence analysis of the gene encoding the
nucleocapsid protein of HCV-229E. The mRNA leader sequence was also
identified. The results are compared with sequences of other
coronaviruses including MHV, BCV, IBV, and TGEV.

MATERIALS AND METHODS 
Virus and cells 

HCV-229E (obtained from Dr. J. Fleming, University of Southern
California) was propagated at low multiplicities of infection in
human fetal lung cells L132 (Kennedy and Johnson-Lussenberg,
1975/1976), using Dulbecco's modified Eagle's medium (DMEM)
supplemented with 10% fetal calf serum.

Virus purification and preparation of virion RNA 

Following a virus adsorption period of 1 hr at 37°, HCV-229E-infected
L132 monolayers were incubated at 37° for 24 to 48 hr, at which time
the cell culture fluid was harvested. Viruses were precipitated from
2 liters of culture fluid with 50% ammonium sulfate and centrifuged
at 8000 rpm for 30 min. The pellet was resuspended in NTE buffer
(0.1 M NaCl, 0.01 M Tris-hydrochloride (pH 7.2), 1 mM EDTA) and then
placed on a discontinuous sucrose gradient consisting of 60, 50, 30,
and 20% (w/w) sucrose in NTE buffer and centrifuged at 26,000 rpm for
13 hr at 4° in a Beckman SW28.1 rotor. The virus band at the
interface between 50 and 30% sucrose was collected and diluted
threefold with NTE buffer. The diluted virus suspension was
centrifuged on a linear sucrose gradient at 26,000 rpm in an SW28.1
rotor for 4 hr at 4°. The virus band was collected and treated with
proteinase K (0.2 mg/ml) for 20 min at 37°, followed by 1% SDS for
30 min at 37°. Genomic RNA was extracted with phenol and then with
phenol/chloroform, and precipitated with ethanol.


Preparation of intracellular RNA 

Monolayers of L132 cells grown in 100 × 20-mm culture dishes were
infected with HCV-229E. Cells were incubated in phosphate-free DMEM
containing 1% dialyzed fetal calf serum 4 hr prior to RNA extraction.
Actinomycin D (1 µg/ml) (Sigma) and [32P]orthophosphate (70 µCi/ml)
(ICN Radiochemicals) were added at 3 and 2 hr, respectively, prior to
RNA extraction at 15 hr postinfection (p.i.). Cells were collected in
cold phosphate-buffered saline and centrifuged at 5000 rpm for 3 min
at 4°. The pellet was mixed with cold 0.5% Nonidet-P40 in NTE buffer,
incubated for 10 min at 4°, and then centrifuged at 5000 rpm for
3 min. The supernatant was transferred to a fresh tube containing
1/10 vol of 10% SDS at room temperature and vortexed briefly.
Intracellular RNA was extracted with phenol and phenol/chloroform and
precipitated with ethanol. Poly(A)-containing RNA was selected by
oligo(dT)-cellulose chromatography as previously described (Makino
et al., 1984).

To examine the kinetics of viral mRNA synthesis, intracellular RNA
was extracted from virus-infected L132 monolayers in 60 × 15-mm
culture dishes at 7, 21, 29, 46, and 58 hr postinfection.

cDNA cloning 

cDNA cloning was performed using a modified method of Gubler and
Hoffman (1983). The poly(A)-containing RNA extracted from
229E-infected L132 monolayers was precipitated, dried, and
resuspended in 6.72 µl of autoclaved water. The RNA was incubated
with 10 mM methylmercuric hydroxide in an 8-µl total volume for
10 min at room temperature. First-strand cDNA synthesis was carried
out in a 50-µl reaction mixture containing 60 units RNasin (Promega
Biotec), 10 mM MgCl2, 100 mM KCl, 50 mM Tris-HCl (pH 8.3 at 42°),
10 mM DTT, 1.25 mM dNTPs, 40 µCi [α-32P]dATP (3000 Ci/mmol), 28 mM
β-mercaptoethanol, and 10 ng oligo(dT)12-18 primer. After 5 min at
room temperature, 40 units of AMV reverse transcriptase (Life
Science) was added and the mixture was incubated for 1 hr at 42°. The
reaction was stopped by adding 4.4 µl of 250 mM EDTA. The products
were extracted with phenol/chloroform and precipitated with ethanol
containing 0.3 M ammonium acetate. For second-strand synthesis, the
100-µl reaction mixture contained 5 mM MgCl2, 100 mM KCl, 20 mM
Tris-HCl (pH 7.5), 50 µg/ml bovine serum albumin (BSA), 10 mM
ammonium sulfate, 0.15 mM β-NAD, 100 µM dNTPs, 25 units of
Escherichia coli DNA polymerase I, 2 units of E. coli DNA ligase, and
0.8 units of RNase H. Sequential incubations were for 1 hr at 12° and
1 hr at 22°. The reaction was stopped by the addition of 8.7 µl of
250 mM EDTA and the products were extracted with phenol/chloroform
and precipitated with ethanol in the presence of 0.3 M ammonium
acetate. Homopolymeric tailing of double-stranded cDNA with poly(C)
was carried out in a 12-µl reaction mixture containing 10 units of
terminal transferase, 200 mM potassium cacodylate, 0.5 mM CoCl2,
25 mM Tris-HCl (pH 6.9), 2 mM DTT, 250 µg/ml BSA, and 50 dCTP at 37°
for 4 min. The dC-tailed double-stranded DNA was annealed to 200 µg
of dG-tailed PstI-cut pBR322 plasmid in 20 µl of a buffer containing
10 mM Tris-HCl (pH 7.4), 100 mM NaCl, and 0.25 mM EDTA. The mixture
was incubated for 5 min at 68° and then cooled slowly overnight. The
annealed molecules were used to transform E. coli MC1061 as described
(Dagert and Ehrlich, 1979).
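
The two EDTA stop steps above can be checked with simple dilution arithmetic. The following is a minimal sketch, assuming that volumes are additive and that the reaction volumes are the nominal 50 µl and 100 µl given in the text; the function name is illustrative only.

```python
def final_concentration(stock_mM, added_ul, reaction_ul):
    """Final concentration (mM) after adding a stock to a reaction.

    Uses C1*V1 = C2*V2, where V2 is the reaction volume plus the
    added volume (volumes assumed to be additive).
    """
    return stock_mM * added_ul / (reaction_ul + added_ul)


# First-strand reaction: 4.4 ul of 250 mM EDTA into 50 ul
first_strand = final_concentration(250.0, 4.4, 50.0)
# Second-strand reaction: 8.7 ul of 250 mM EDTA into 100 ul
second_strand = final_concentration(250.0, 8.7, 100.0)

print(f"first strand:  {first_strand:.1f} mM EDTA")
print(f"second strand: {second_strand:.1f} mM EDTA")
```

Under these assumptions both stop steps bring EDTA to roughly 20 mM, which exceeds the MgCl2 concentration in either reaction.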

Colony hybridization 

Colonies grown on LB/tetracycline plates were incubated at 37° for
12 hr and transferred to Colony/Plaque Screen disks (New England
Nuclear). Bacterial lysis and DNA fixation were carried out according
to the methods previously described (Grunstein and Hogness, 1975).
The disks were prehybridized in a solution containing 0.2%
polyvinylpyrrolidone (MW 40,000), 0.2% Ficoll (MW 400,000), 0.2% BSA,
0.05 M Tris-HCl (pH 7.5), 1% SDS, 1 M NaCl, 10% dextran sulfate, and
100 µg/ml denatured salmon sperm DNA at 65° for 6 hr. Fragments
derived from either the 5'- or 3'-ends of gene 7 were labeled with
32P by nick-translation and added to the solution. Hybridization was
carried out for 20 hr at 65°. The disks were then washed twice in
2× SSC (0.3 M NaCl, 30 mM sodium citrate) at room temperature, twice
in 2× SSC containing 1% SDS for 30 min at 65°, and twice in 0.1× SSC
at room temperature for 30 min. The disks were air-dried and exposed
to X-ray film at -70°.
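
The wash stringencies follow from the 2× SSC composition given in the text (0.3 M NaCl, 30 mM sodium citrate). As a small worked example, assuming SSC components scale linearly with the stated fold concentration, the composition of any dilution can be computed as below; the helper name is hypothetical.

```python
# 2x SSC as defined in the text: 0.3 M NaCl, 30 mM sodium citrate.
NACL_MM_AT_2X = 300.0
CITRATE_MM_AT_2X = 30.0


def ssc_composition(fold):
    """Return (NaCl mM, sodium citrate mM) for a given fold of SSC."""
    scale = fold / 2.0
    return NACL_MM_AT_2X * scale, CITRATE_MM_AT_2X * scale


for fold in (2.0, 0.1):
    nacl, citrate = ssc_composition(fold)
    print(f"{fold}x SSC: {nacl:g} mM NaCl, {citrate:g} mM sodium citrate")
```

The final 0.1× SSC wash therefore corresponds to about 15 mM NaCl and 1.5 mM sodium citrate, a 20-fold reduction in salt relative to the first washes.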

Northern hybridization 

Intracellular RNA from virus-infected cells was denatured by glyoxal
treatment and separated by electrophoresis on a 1% agarose gel
containing 10 mM sodium phosphate (pH 7.0) as described previously
(McMaster and Carmichael, 1977). RNA transfer to Biodyne nylon
filters (ICN Radiochemicals) and subsequent hybridization were
performed according to the method described by Thomas (1980).

Primer extension 

A synthetic oligodeoxyribonucleotide was 5'-end-labeled with
[γ-32P]ATP by polynucleotide kinase (Pedersen and Haseltine, 1980).
The total amount of poly(A)-containing RNA extracted from
229E-infected cell monolayers in three 150 × 20-mm culture dishes was
incubated in 8 µl of distilled water containing 10 mM methylmercuric
hydroxide for 10 min at room temperature. A further incubation was
carried out in a 50-µl reaction volume containing 60 units of RNasin
(Promega), 10 mM MgCl2, 100 mM KCl, 50 mM Tris-HCl (pH 8.3 at 42°),
10 mM DTT, 1.25 mM dNTPs, 28 mM β-mercaptoethanol, 5'-end-labeled
synthetic oligodeoxyribonucleotides, and 20 units of AMV reverse
transcriptase (Life Science) for 1 hr at 42°. Reaction products were
extracted with phenol/chloroform, precipitated with ethanol, and then
analyzed by electrophoresis on a 6% polyacrylamide gel containing
8.3 M urea. The primer-extended product was identified by
autoradiography and eluted from the gel according to the published
procedure (Maxam and Gilbert, 1977).

DNA sequencing 

Sequencing was carried out by the dideoxyribonucleotide chain
termination method (Sanger et al., 1977) as well as the chemical
modification procedure (Maxam and Gilbert, 1977). In the first
method, fragments of cDNA inserts generated by various restriction
endonucleases were cloned into the M13 vectors mp18 and mp19 (Messing
and Vieira, 1982). [α-35S]dATP was used as a label. Sequence data
were also obtained by chemical modification (Maxam and Gilbert, 1977)
of various cDNA fragments subcloned into the pT7-3 vector (Tabor and
Richardson, 1985). In the second method, cDNA fragments were
3'-end-labeled with Klenow fragment at internal restriction sites or,
alternatively, at the polylinker cloning site of pT7-3. End-labeled
cDNA restriction fragments were separated by electrophoresis on
preparative polyacrylamide gels (Maxam and Gilbert, 1980) and
purified as described previously (Hansen et al., 1980; Hansen, 1981).
Sequencing of the primer-extended product of mRNA7 was performed by
the chemical modification procedure (Maxam and Gilbert, 1977).
Sequence analysis was performed by the Intelligenetics and Seqaid
programs. Hydropathy profiles were constructed using the PepPlot
program of the University of Wisconsin Computer Genetics Group, which
employs both the Kyte-Doolittle (KD) and Goldman, Engelman, Steitz
(GES) algorithms.
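
For readers unfamiliar with hydropathy profiles, the Kyte-Doolittle approach can be sketched as a sliding-window average of per-residue hydropathy values. This is only an illustration, not the PepPlot implementation: the window length of 9 is an assumed default (the settings used in this study are not stated), and the GES scale is not included.

```python
# Kyte-Doolittle hydropathy index for the 20 standard amino acids.
KD_SCALE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean KD hydropathy for every window of `window` residues.

    Returns one value per window start, so the profile has
    len(sequence) - window + 1 points. The window size is an
    assumption for illustration.
    """
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and len(sequence)")
    scores = [KD_SCALE[residue] for residue in sequence.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# First ten residues of the deduced N protein (Fig. 4), window of 3.
profile = hydropathy_profile("MATVKWADAS", window=3)
print([round(value, 2) for value in profile])
```

Positive values indicate hydrophobic stretches and negative values hydrophilic ones; plotting the profile against residue position gives curves of the kind compared between N proteins in this paper.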

RESULTS 

Kinetics of HCV-229E mRNA synthesis 

To determine the optimum time for extracting 229E-specific mRNAs, we
first studied the kinetics of virus-specific mRNA synthesis.
Intracellular RNA was extracted from infected L132 monolayers at
specified times p.i. The RNA was separated by agarose gel
electrophoresis (Fig. 1). As can be seen, viral mRNA synthesis could
be detected as early as 7 hr p.i. and reached a maximum at 29 hr p.i.
Thereafter, total RNA synthesis gradually declined. By 46 hr p.i.
only the most abundant mRNA species were evident. The number and size
of these mRNA species are comparable to those of MHV mRNAs and are in
agreement with previously published results (Weiss and Leibowitz,
1981). Significantly, mRNA 2a, which was previously found only in
BCV-infected cells and proposed to encode hemagglutinins (King et
al., 1985; Keck et al., 1988), was not present. This is consistent
with the finding that HCV-229E does not have hemagglutinating
activity (Hierholzer, 1976). The relative amounts of the mRNA species
were the same throughout the replication cycle. Therefore, in all of
our subsequent experiments, the virus-specific intracellular RNAs
were extracted at 15 hr p.i.

Fig. 1. Kinetics of synthesis of HCV-229E-specific RNAs.
Intracellular RNA from HCV-229E-infected L132 cell monolayers was
labeled with [32P]orthophosphate, extracted, and separated by agarose
gel electrophoresis as described under Materials and Methods. RNA was
extracted at the following times: lane A, 7 hr p.i.; lane B, 21 hr
p.i.; lane C, 29 hr p.i.; lane D, 46 hr p.i.; lane E, 58 hr p.i. The
positions and designations of HCV-229E-specific RNAs are indicated by
the numbers on the left side of the figure.

Molecular cloning of HCV-229E genomic RNA and 
intracellular virus-specific mRNAs 

cDNA cloning was initially performed using virion genomic RNA as a
template. The sizes of inserts in the resultant cDNA clones ranged
from 0.2 to 0.5 kb in length. One clone, A34, contained a 0.45-kb
insert, which was subsequently characterized by restriction mapping
and Northern blot analysis. The 0.45-kb fragment was labeled with 32P
by nick-translation and hybridized with intracellular RNA from
229E-infected cells. The result, shown in Fig. 2, revealed that the
fragment hybridized to each of the mRNA species. This result
suggested that the HCV-229E subgenomic mRNAs possess a nested-set
structure similar to other coronaviruses (Lai, 1988) and that A34
represented a cDNA clone of either the 3'-end of the genomic RNA or
the leader sequence.

Cloning was subsequently carried out using intracellular RNA from
229E-infected cells as a template. The resulting cDNA clones were
screened by colony hybridization using the 0.45-kb fragment from
clone A34 as a nick-translated probe (Fig. 3). Several positive
colonies were identified and characterized further. Clone L8
contained a 3.6-kb insert but lacked a 3'-poly(A) tail. Clone L37,
which contained an insert of 1.7 kb, overlapped L8 but was 0.1 kb
shorter at the 3'-end. This clone also lacked a poly(A) sequence (see
below). Therefore, additional cDNA clones were isolated using a
0.24-kb BalI-EcoRI fragment of L8 (Fig. 3a) as a probe. These latter
clones were further characterized by Southern blot analysis. Clone
S10 contained an insert of 0.8 kb which overlapped the 3'-ends of the
two previous clones and extended another 0.4 kb in that direction.
Figure 3b shows the orientation and sizes of clones L8, L37, S10, and
A34 with reference to the viral genome. Restriction enzyme sites used
for sequencing are also shown.

Fig. 2. Northern blot analysis of HCV-229E-specific intracellular
RNA hybridized with clone A34. Intracellular RNA from either
uninfected (lane A) or HCV-229E-infected (lane B) L132 monolayers was
denatured by glyoxal treatment, separated on a 1% agarose gel, and
transferred to Biodyne nylon filters as described under Materials and
Methods. The positions and designations of the HCV-229E-specific
RNAs are indicated by the numbers on the right side of the figure.

Fig. 3. Diagram of the 3'-end of the HCV-229E genome, including cDNA
clones and sequencing strategy. (a) Restriction map of the 3'-end
cDNA clones with reference to the entire viral genome. (b) Relative
position and size of the cDNA clones which were sequenced. Clones L8
and L37 are shown in part. Clone A34 was used in colony hybridization
studies. (c) Direction and extent of sequence data obtained from
subcloned fragments. Arrows with solid squares indicate dideoxy
sequencing method. Arrows with open circles indicate chemical
modification sequencing method. Arrows with open diamonds indicate
sequencing, by chemical modification, of fragments subcloned into the
PstI-SmaI sites of pT7-3 and 3'-end-labeled with Klenow fragment. The
arrow with a closed circle indicates a DNA fragment which was labeled
at the 3'-end using radiolabeled cordycepin and terminal transferase.
Abbreviations: B, BalI; E, EcoRI; H, HindIII; P, PstI; R, RsaI.

Sequencing of the cDNA clones 

To determine the sequence of the 3'-end of HCV-229E genome, various
restriction fragments of L8, L37, and S10 were subcloned into M13
vectors. For L8, only the 1.2-kb fragment extending from an internal
PstI site toward the 3'-end was sequenced. Clone L37 was also
sequenced in part. Figure 3c shows the cDNA fragments and strategy
used in sequencing. Each region





5' -CGGAAGGTCCGTAAATTCACAAAATAGCACAGGCTGGGTTTTCTACGTACGAGTAAAACACGGTGATTTTTCTGCAGTGAGCTCTC 8 6 

MATVKWADAS 10 

8 7 r.r.ATGAGCAAGATGACAGAAAACGAAAGATTGCTTCATTTTTTCTAAACTGAACGAAAAGATGGCTACAGTCAAATGG GCTGATGCATP.T 176 

11 EPQRGRQGRIPYSLYSPLLVDSEQSWKVIP 40 

177 GAACCACA AC.GTGGTCGTCAGGGTAGAATAC CTTATTCTCTTTATAGCCCTTTGCTTGTTGATAGTGAACAATCTTGGAAGGTGATACCT 266 

41 RNLVPINKKDKNKLIGYWNVQKRFRTRKGK 70 

267 CGTAATCTGGTACCCATCAACAAGAAAGACAAAAATAAGCTTATAGGCTATTGGAATGTTCAAAAACGTTTCAGAACTAGAAAGGGCAAA 356 

71 RVDLSPKLHFYYLGTGPHKDAKFRERVEGV 100 

357 CGGGTGGATTTGTCACCCAAGCTGCATTTTTATTATCTTGGCACAGGACCCCATAAAGATGCAAAATTTAGAGAGCGTGTTGAAGGTGTC 446 

101 VWVAVDGAKTEPTGHGARRKNSEPEIPHFN 130 

447 GTCTGGGTTGCTGTTGATGGTGCTAAAACTGAACCTACAGGCCACGGCGCCAGGCGCAAGAATTCAGAACCAGAGATACCACACTTCAAT 536 

131 QKLPNGVTVVEEPDSRAPSRSQSRSQSRGP 160 

537 CAAAAGCTCCCAAATGGTGTTACTGTTGTTGAAGAACCTGACTCCCGTGCTCCTTCCCGGTCTCAGTCGAGGTCGCAGAGTCGCGGTCCT 626 

161 GESKPQSRNPSSDRYHNSQDDIMKAVAAAL 190 

627 GGTGAATCCAAACCTCAATCTCGGAATCCTTCAAGTGACAGATACCATAACAGTCAGGATGACATCATGAAGGCAGTTGCTGCGGCTCTT 716 

191 KSLGFDKPQEKDKKSAKTGTPKPSRNQSPA 220 

717 AAATCTTTAGGTTTTGACAAGCCTCAGGAAAAAGATAAAAAGTCAGCGAAAACGGGTACTCCTAAGCCTTCTCGTAATCAGAGTCCTGCT 806 

221 SSQTSAKSLARSQSSETKEQKHEIEKPRWK 250 

807 TCTTCTCAAACTTCTGCCAAGAGTCTTGCTCGTTCTCAGAGTTCTGAAACAAAAGAACAAAAGCATGAAATCGAAAAGCCACGGTGGAAA 896 

251 RQPNDDVTSNVTQCFGPRDLDHNFGSAGVV 280 

897 AGACAGCCTAATGATGATGTGACATCTAATGTCACACAATGTTTTGGCCCCAGAGACCTTGACCACAACTTTGGAAGTGCAGGTGTTGTG 986 

281 ANGVKAKGYPQFAELVPSTAAMLFDSHIVS 310 

987 GCCAATGGTGTTAAAGCTAAAGGCTATCCACAATTTGCTGAGCTTGTGCCGTCAACAGCTGCTATGCTGTTTGATAGTCACATTGTTTCC 1076 

311 KESGNTVVLTFTTRVTVPKDHPHLGKFLEE 340 

1077 AAAGAGTCAGGCAACACTGTGGTCTTGACTTTCACTACTAGAGTGACTGTGCCCAAAGACCATCCACACTTGGGTAAGTTTCTTGAGGAG 1166 

341 LNAFTREMQQQPLLNPSALEFNPSQTSPAT 370 

1167 TTAAATGCATTCACTAGAGAAATGCAACAACAGCCTCTTCTTAACCCTAGTGCACTAGAATTCAACCCATCTCAAACTTCACCTGCAACT 1256 

371 AEPVRDEFSIETDIIDEVNZ 389 

1257 GCTGAACCAGTGCGTGATGAATTTTCTATTGAAACTGACATAATTGATGAAGTAAACTAAACATGCCACTGTGTTGTTTGAAATTCAGGC 1346 

1347 TTTAGTTGGAATTTTGCTTTTGCTCTTGCTTTTATTATCTTTCTTTAATACATTGCTTTTCTCTGATCTATGTATGATGGTACGATCAGA 1436 

1437 GCTACTTTTAATTAACATGATCCCTTGCTTTGGCTTGATAAGGATCTAGTCTTATACACAATGGTAAGCCAGTGGTAGTAAAGGTATAAG 1526 

1527 AAATTTGCTACTATGTTACTGAACCTAGGTGAACGCTAGTATAACTCATTACAAATGTGCTGGAGTAATCAAAGATCGCATTGACGAGCC 1616 

1617 AACAATGGAAGAGCCAGTCATTTGTCTTGAGACCTATCTAGTTAGTAACTGCTAATGGAACGGTTTCGATATGGATACAC-POLY (A) -3' 1696 


Fig. 4. The primary nucleotide sequence of the 3'-end of HCV-229E RNA and the deduced amino acid sequence of the nucleocapsid protein. 
A primer extension study was carried out using a synthetic oligodeoxyribonucleotide complementary to an 18-mer sequence underlined near 
the 5'-end of the gene. The 3'-noncoding region contains a conserved sequence which is shown by the double line. The intergenic conserved 
sequence, TCTAAACT, is also shown (dotted line). 


was verified by dideoxy chain termination sequencing of both strands
or by the chemical modification method. Clone S10 was found to have a
poly(A) stretch of 34 bases. Figure 4 shows the complete DNA sequence
with a translation of the main open reading frame (ORF) in one-letter
amino acid code. This ORF extends from base 147 to base 1313 and
predicts a 389 amino acid protein with a molecular weight of 43,366
Da. This predicted molecular weight is slightly smaller than the
measured molecular weight of the nucleocapsid protein of HCV-229E,
which is 50 kDa as determined by SDS-polyacrylamide gel
electrophoresis (MacNaughton, 1980). The difference is probably due
to phosphorylation or other modification of the protein. The
predicted protein shares features with the nucleocapsid proteins of
TGEV, MHV, BCV, HCV-OC43, and IBV (Kapke and Brian, 1986; Skinner and
Siddell, 1984; Armstrong et al., 1983; Lapps et al., 1987; Kamahora
et al., 1988; Boursnell et al., 1985). Namely, the protein is highly
basic and rich in serine residues. Sixty percent of the amino acid
residues are basic and 12% are acidic. There are 39 serine residues
(10% of total), which are presumed to be sites of phosphorylation
(Stohlman and Lai, 1979). When compared to TGEV, with which HCV-229E
shares antigenic properties, both N proteins have identical amounts
of basic and acidic amino acids and serine residues and similar
molecular weights (Kapke and Brian, 1986).
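
The composition figures quoted above can be recomputed directly from the deduced amino acid sequence in Fig. 4. The sketch below is a hedged illustration rather than the analysis software used in the study: the sequence is transcribed from Fig. 4 (without the terminal stop symbol), "basic" is assumed to mean lysine plus arginine (histidine is reported separately), "acidic" is assumed to mean aspartate plus glutamate, and the molecular weight uses standard average residue masses, so it will only approximate the published value.

```python
# Deduced N protein sequence transcribed from Fig. 4 (stop excluded).
N_PROTEIN = (
    "MATVKWADAS"
    "EPQRGRQGRIPYSLYSPLLVDSEQSWKVIP"
    "RNLVPINKKDKNKLIGYWNVQKRFRTRKGK"
    "RVDLSPKLHFYYLGTGPHKDAKFRERVEGV"
    "VWVAVDGAKTEPTGHGARRKNSEPEIPHFN"
    "QKLPNGVTVVEEPDSRAPSRSQSRSQSRGP"
    "GESKPQSRNPSSDRYHNSQDDIMKAVAAAL"
    "KSLGFDKPQEKDKKSAKTGTPKPSRNQSPA"
    "SSQTSAKSLARSQSSETKEQKHEIEKPRWK"
    "RQPNDDVTSNVTQCFGPRDLDHNFGSAGVV"
    "ANGVKAKGYPQFAELVPSTAAMLFDSHIVS"
    "KESGNTVVLTFTTRVTVPKDHPHLGKFLEE"
    "LNAFTREMQQQPLLNPSALEFNPSQTSPAT"
    "AEPVRDEFSIETDIIDEVN"
)

# Standard average residue masses (Da); one water is added per chain.
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167,
    "V": 99.1326, "T": 101.1051, "C": 103.1388, "L": 113.1594,
    "I": 113.1594, "N": 114.1038, "D": 115.0886, "Q": 128.1307,
    "K": 128.1741, "E": 129.1155, "M": 131.1926, "H": 137.1411,
    "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER_MASS = 18.015


def count_residues(sequence, residues):
    """Number of positions in `sequence` occupied by any of `residues`."""
    return sum(sequence.count(residue) for residue in residues)


def molecular_weight(sequence):
    """Approximate unmodified molecular weight in daltons."""
    return sum(RESIDUE_MASS[residue] for residue in sequence) + WATER_MASS


length = len(N_PROTEIN)
orf_codons = (1313 - 147 + 1) // 3  # ORF coordinates from the text
for label, residues in (("K+R", "KR"), ("H", "H"), ("D+E", "DE"), ("S", "S")):
    n = count_residues(N_PROTEIN, residues)
    print(f"{label}: {n} residues ({100.0 * n / length:.1f}%)")
print(f"length: {length} aa, ORF codons: {orf_codons}")
print(f"approximate molecular weight: {molecular_weight(N_PROTEIN):.0f} Da")
```

The same function can be applied to any of the other N protein sequences cited here to compare basic, acidic, and serine content on a common definition.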

Figure 5 shows a schematic diagram of the possible ORFs obtained by
translating the nucleotide sequence. The ORF in frame 3 is likely the
one which encodes the nucleocapsid protein. In frame 2, the
5'-flanking region probably contains part of the sequence of the
matrix protein encoded by gene 6. This possibility is supported by
the finding that reading frame 2 remains open at the extreme 5'-end.
Furthermore, the sequence TCTAAACT, which is found in the intergenic
regions of several other coronaviruses (Kapke and Brian, 1986;
Skinner and Siddell, 1984; Armstrong et al., 1983; Lapps et al.,
1987; Kamahora et al., 1988; Budzilowicz et al., 1985), is also
present between the presumed initiation codon of the main ORF and the
3'-end of gene 6. This sequence is the proposed site of fusion of the
leader sequence with the mRNA coding region (Shieh et al., 1987;
Makino et al., 1986; Budzilowicz et al., 1985).

Fig. 5. Schematic diagram of the possible open reading frames
obtained when translating the primary nucleotide sequence. Vertical
lines above the baseline represent potential initiation codons.
Termination codons are indicated by vertical lines below the
baseline. Frame 3 depicts a single, long ORF encoding the
nucleocapsid protein. ORFs which are greater than 30 amino acids are
also shown. Those lacking translation start sites are indicated by
dashed lines.
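
The kind of three-frame ORF search summarized in Fig. 5 can be sketched as follows. This is an illustrative scan, not the program used in the study: it reports only ORFs that begin with ATG and end at an in-frame stop codon, the 30-codon threshold mirrors the cutoff mentioned in the Fig. 5 legend, and the toy sequence is hypothetical. Frames are numbered 1 to 3 from the first base of the input, which is assumed to match the numbering used in the figure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=30):
    """Scan the three forward frames for ATG-initiated ORFs.

    Returns tuples (frame, start, end, codons) where frame is 1-3,
    start and end are 1-based inclusive coordinates of the first ATG
    and the last sense codon, and codons excludes the stop codon.
    ORFs that run off the end of the sequence are not reported.
    """
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if codon in STOP_CODONS:
                if start is not None:
                    codons = (i - start) // 3
                    if codons >= min_codons:
                        orfs.append((frame + 1, start + 1, i, codons))
                start = None
            elif codon == "ATG" and start is None:
                start = i
    return orfs


# Hypothetical toy sequence: two filler bases, then ATG + 4 codons + TAA.
toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "G"
print(find_orfs(toy, min_codons=5))
```

With this numbering, an ORF whose first base is at position 147 falls in frame 3, since (147 - 1) mod 3 = 2, which agrees with the frame assigned to the nucleocapsid ORF above.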

The 3'-noncoding region contains the sequence TGGAAGAGCCA, 75
nucleotides from the 3'-end (Fig. 4), which is relatively conserved
among coronaviruses and is found at approximately the same location
in all of these viral genomes (Kapke and Brian, 1986; Skinner and
Siddell, 1984; Armstrong et al., 1983; Lapps et al., 1987; Kamahora
et al., 1988; Boursnell et al., 1985) (Table 1). There is only one
nucleotide difference in this conserved sequence when it is compared
with that of TGEV, BCV, and HCV-OC43. Two and three nucleotide
differences are found in IBV and MHV, respectively. This conservation
of sequence and location suggests that it may be important for viral
RNA replication.
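
The nucleotide differences stated above can be reproduced by counting mismatches between the 11-nucleotide sequences listed in Table 1. The sketch below simply compares each sequence position by position with the HCV-229E sequence; the variable and function names are illustrative.

```python
# 11-nucleotide conserved sequences as listed in Table 1.
CONSERVED = {
    "HCV-229E": "TGGAAGAGCCA",
    "TGEV": "TGGAAGAGCTA",
    "BCV": "GGGAAGAGCCA",
    "HCV-OC43": "GGGAAGAGCCA",
    "IBV": "GGGAAGAGCTA",
    "MHV-JHM": "GGGAAGAGCTC",
    "MHV-A59": "GGGAAGAGCTC",
}


def mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


reference = CONSERVED["HCV-229E"]
differences = {
    virus: mismatches(reference, sequence)
    for virus, sequence in CONSERVED.items()
    if virus != "HCV-229E"
}
for virus, count in differences.items():
    print(f"{virus}: {count} difference(s) from HCV-229E")
```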

In frame 1, there are several additional ORFs of at least 30 amino
acids. Some of these, including one found in the 3'-noncoding region,
lack appropriate translation start sites. Another long internal ORF
is found from base 322 through 693. This contains an appropriate
initiation sequence and encodes a hypothetical protein of 13,974 Da,
which is rich in leucine residues (17%). The significance of this ORF
remains to be defined.

Leader sequence of HCV-229E 

The mRNAs of coronaviruses contain a stretch of leader sequence which
is derived from the 5'-end of the viral genome and exhibits homology
with the intergenic consensus sequence (Shieh et al., 1987;
Budzilowicz et al., 1985). Since our cDNA clones did not appear to
contain leader sequences, we used primer extension studies to
determine the sequence of the HCV-229E leader RNA. A synthetic
oligodeoxyribonucleotide which was complementary to an 18-mer
sequence located near the 5'-end of the gene (Fig. 4) was end-labeled
and used in a primer extension study with poly(A)-selected
intracellular mRNA as a template. The reaction products, separated by
agarose gel electrophoresis, revealed six bands (data not shown).
Since these bands were most likely to represent the primer-extended
products of the individual mRNA species, the smallest and most
abundant band, corresponding to the primer-extended product of mRNA7,
was eluted and sequenced by the chemical modification method (Maxam
and Gilbert, 1977). The sequence of the 3'-end of the primer-extended
product was identical to the L8 sequence from nucleotides 129 to 171.
At nucleotide 128, immediately 5' to the proposed leader mRNA fusion
site, the sequence diverged from the L8 sequence and revealed a
putative 60-base leader sequence which is shown in Fig. 6. The figure
also shows a degree of homology with the leader sequence of IBV.
Considerably less homology exists between the leader sequence of
HCV-229E and those of HCV-OC43 and MHV-JHM (data not shown).
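
The logic of locating the leader-body junction, comparing the primer-extended product with the genomic (L8) sequence from the 3' side until the two diverge, can be sketched as below. The two sequences are short hypothetical strings chosen only to illustrate the comparison; they are not the HCV-229E sequences.

```python
def shared_3prime_length(a, b):
    """Length of the identical stretch shared by the 3' ends of a and b."""
    shared = 0
    while (shared < len(a) and shared < len(b)
           and a[-1 - shared] == b[-1 - shared]):
        shared += 1
    return shared


# Hypothetical example: same 3' body sequence, different 5' portions.
genomic_region = "GATTGC" + "TCTAAACTGAAC"
extended_product = "CCTTAG" + "TCTAAACTGAAC"

shared = shared_3prime_length(genomic_region, extended_product)
leader_part = extended_product[:len(extended_product) - shared]
print(f"shared 3' stretch: {shared} nt; divergent 5' portion: {leader_part}")
```

Applied to real data, the point at which the match ends marks the position where leader-derived sequence replaces the genomic sequence, which is how the divergence at nucleotide 128 is interpreted above.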

DISCUSSION 

This report presented the primary sequence of the nucleocapsid gene
and leader sequence of HCV-229E. When compared to the known sequences
of other coronaviruses (Kapke and Brian, 1986; Skinner and Siddell,
1984; Armstrong et al., 1983; Lapps et al., 1987; Kamahora et al.,
1988; Boursnell et al., 1985), common features of coronavirus
nucleocapsid proteins emerged; namely, they are highly basic and have
a high proportion of serine residues, which have been shown to be
sites of phosphorylation (Stohlman and Lai, 1979). The relationship
between the nucleocapsid genes of HCV-229E and TGEV is particularly
interesting since the viruses are antigenically related (MacNaughton,
1981). The predicted molecular weights of the N protein and the
number of potential phosphorylation sites of both viruses are almost
identical. Although these two viruses have little nucleotide sequence
homology between their nucleocapsid genes, the amino acid sequences
are homologous within a limited region. Amino acid sequence analysis
revealed several structural features common to both viruses, which
may have functional significance. For instance, there is a region of
46% homology within the amino-terminal one-third of the protein which
extends from residues 29 to 134 in HCV-229E, and 41 to 146 in TGEV.
Furthermore, approximately 10 amino acids downstream from the
homologous region in both proteins lies an area which is abundant in
serine residues, suggesting that this may be an important functional
domain of the molecule. To further examine such functional homology
between the two proteins, hydropathy profiles were constructed
(Fig. 7). The contour of these plots suggests that a certain degree
of functional homology exists within the first and last one-third of
each molecule, with an additional region around position 200. The
peak around position 200 occurs just after the serine-rich region of
the molecule. The relative conservation of these regions suggests a
possible role in the interaction of the N protein with the viral
genome. Similar structural features exist among the N proteins of
HCV-229E, IBV, MHV, HCV-OC43, and BCV (Skinner and Siddell, 1984;
Lapps et al., 1987; Kamahora et al., 1988; Boursnell et al., 1985).
This is demonstrated by the hydropathy profiles of these proteins,
which are also shown in Fig. 7. Further studies are required to
reveal the functional significance of the conserved domains.

TABLE 1

Conserved Sequence at the 3'-Noncoding Region of Coronavirus

Virus        3' conserved sequence
HCV-229E     TGGAAGAGCCA (75)
TGEV         TGGAAGAGCTA (76)
BCV          GGGAAGAGCCA (79)
HCV-OC43     GGGAAGAGCCA (79)
IBV          GGGAAGAGCTA (81)
MHV-JHM      GGGAAGAGCTC (82)
MHV-A59      GGGAAGAGCTC (82)

Note. Number in parentheses indicates distance, in nucleotides, of
the conserved sequence from the poly(A) region.

III II I 

HCV-229E 5'-CTTAAG * TACCTTAT *CTATCTA* C AAATAGAAAAG * *TTGCTTTTTAGACTTTGTGTC *TA*CTTC 

IBV 5'-ACTTAAGATAGATATTAATATATATCTATTACACTAGCCTTGC * *gctagatttttaa*cttaacaaa. 

Fig. 6. HCV-229E mRNA leader sequence compared to the leader sequence
of IBV. The IBV leader extends for at least 16 nucleotides in the
3' direction.

Fig. 7. Hydropathy profiles of coronavirus N proteins. Both the K-D
(solid line) and GES (dashed line) curves are depicted with scales on
the right and left, respectively.

Another interesting finding is the open reading frame internal to the
main coding region of the HCV-229E N gene. Thus far, two other
coronaviruses, BCV and MHV-JHM, have been found to contain internal
ORFs in gene 7 (Skinner and Siddell, 1984; Lapps et al., 1987) which
are preceded by optimum translation initiation signals according to
Kozak's consensus sequence (Kozak, 1983). The predicted amino acid
sequences could encode hypothetical proteins of molecular weights
13,973; 14,842; and 23,057 for HCV-229E, MHV-JHM, and BCV,
respectively. Interestingly, all three sequences are abundant in
leucine residues (17 to 19%). HCV-OC43 also has two smaller internal
ORFs encoding potential leucine-rich proteins of 8830 and 16,297
molecular weights (Kamahora et al., 1988). Further studies to
determine whether this hypothetical protein can be detected in
229E-infected cells or by in vitro translation of a full-length cDNA
clone (i.e., L8) are in progress.

Finally, the 3'-noncoding conserved sequence of gene 7 lends
additional support to a common ancestry for coronaviruses, regardless
of antigenic subgroup. This sequence has been proposed as a
recognition site for the virus-encoded RNA-dependent RNA polymerase
prior to negative-strand synthesis (Kapke and Brian, 1986). Certainly
future studies must focus on examining the role of this conserved
region in the viral replication cycle.